NSF PAR Search | NSF Public Access Repository


Gradient-based Analysis of NLP Models is Manipulable

https://doi.org/10.18653/v1/2020.findings-emnlp.24

Wang, Junlin; Tuyls, Jens; Wallace, Eric; Singh, Sameer (January 2020, Findings of the Association for Computational Linguistics: EMNLP 2020)

Gradient-based analysis methods, such as saliency map visualizations and adversarial input perturbations, have found widespread use in interpreting neural NLP models due to their simplicity, flexibility, and most importantly, the fact that they directly reflect the model internals. In this paper, however, we demonstrate that the gradients of a model are easily manipulable, and thus bring into question the reliability of gradient-based analyses. In particular, we merge the layers of a target model with a Facade Model that overwhelms the gradients without affecting the predictions. This Facade Model can be trained to have gradients that are misleading and irrelevant to the task, such as focusing only on the stop words in the input. On a variety of NLP tasks (sentiment analysis, NLI, and QA), we show that the merged model effectively fools different analysis tools: saliency maps differ significantly from the original model’s, input reduction keeps more irrelevant input tokens, and adversarial perturbations identify unimportant tokens as being highly important.
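The core claim above, that gradients can be overwhelmed without changing predictions, can be illustrated with a toy sketch. This is not the authors' Facade Model or their training procedure. It is a hand-built example under stated assumptions: a hypothetical linear scorer over four one-dimensional token features, plus an added term whose output is bounded by a small constant while its gradient is large. All names and constants (`W_TASK`, `V_FACADE`, `EPS`, `K`) are invented for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy setup (assumption, not from the paper):
W_TASK = [2.0, 0.1, -1.5, 0.05]   # task weights: tokens 0 and 2 drive the score
V_FACADE = [0.0, 1.0, 0.0, 1.0]   # facade direction: tokens 1 and 3 ("stop words")
EPS, K = 1e-3, 1e5                # facade output is at most EPS; its gradient scales with EPS * K

def original_score(x):
    return dot(W_TASK, x)

def original_grad(x):
    # Gradient of a linear score with respect to the input is the weight vector.
    return list(W_TASK)

def facade_term(x):
    # Bounded output (|value| <= EPS) but a steep slope along V_FACADE.
    return EPS * math.sin(K * dot(V_FACADE, x))

def merged_score(x):
    return original_score(x) + facade_term(x)

def merged_grad(x):
    # Analytic gradient of the merged score.
    c = EPS * K * math.cos(K * dot(V_FACADE, x))
    return [w + c * v for w, v in zip(W_TASK, V_FACADE)]

def saliency(grad):
    # Normalized gradient magnitude per token, a simple saliency map.
    mags = [abs(g) for g in grad]
    total = sum(mags)
    return [m / total for m in mags]

def top_token(scores):
    return max(range(len(scores)), key=lambda i: scores[i])

# Input chosen so that dot(V_FACADE, x) == 0, which makes the facade term exactly zero.
x = [1.0, 0.5, 1.0, -0.5]
top_original = top_token(saliency(original_grad(x)))
top_merged = top_token(saliency(merged_grad(x)))
```

For this input the two scores coincide, yet the most salient token moves from index 0 (a task-relevant token) to index 1 (a facade token). For any other input the scores differ by at most `EPS`.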
Full Text Available
Generative Modeling of Atmospheric Convection

https://doi.org/10.1145/3429309.3429324

Mooers, Griffin; Tuyls, Jens; Mandt, Stephan; Pritchard, Mike; Beucler, Tom G (January 2020, CI2020)
While cloud-resolving models can explicitly simulate the details of small-scale storm formation and morphology, these details are often ignored by climate models for lack of computational resources. Here, we explore the potential of generative modeling to cheaply recreate small-scale storms by designing and implementing a Variational Autoencoder (VAE) that performs structural replication, dimensionality reduction, and clustering of high-resolution vertical velocity fields. Trained on ∼6 · 10⁶ samples spanning the globe, the VAE successfully reconstructs the spatial structure of convection, performs unsupervised clustering of convective organization regimes, and identifies anomalous storm activity, confirming the potential of generative modeling to power stochastic parameterizations of convection in climate models.
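The abstract does not state the training objective, so the following is only a minimal sketch of the standard VAE loss terms, assuming a diagonal Gaussian encoder, a standard normal prior, and a squared-error reconstruction term. The function names and the `beta` weight are illustrative assumptions, not taken from the paper.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form.
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    # Sample z = mu + sigma * eps with eps ~ N(0, 1), keeping the sample differentiable in mu and sigma.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def squared_error(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def negative_elbo(x, x_hat, mu, log_var, beta=1.0):
    # Reconstruction error plus weighted KL; beta is a hypothetical weighting knob.
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)

rng = random.Random(0)
z = reparameterize([0.5, -0.5], [-40.0, -40.0], rng)  # near-zero variance, so z is close to mu
```

Under these assumptions, a sample with an unusually high `negative_elbo` is one the model reconstructs or encodes poorly, which is one common way to flag anomalies with a VAE.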
Full Text Available
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models

https://doi.org/10.18653/v1/D19-3002

Wallace, Eric; Tuyls, Jens; Wang, Junlin; Subramanian, Sanjay; Gardner, Matt; Singh, Sameer (October 2019, Conference on Empirical Methods in Natural Language Processing (EMNLP): System Demonstrations)

Neural NLP models are increasingly accurate but are imperfect and opaque—they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model predictions. Unfortunately, existing interpretation codebases make it difficult to apply these methods to new models and tasks, which hinders adoption for practitioners and burdens interpretability researchers. We introduce AllenNLP Interpret, a flexible framework for interpreting NLP models. The toolkit provides interpretation primitives (e.g., input gradients) for any AllenNLP model and task, a suite of built-in interpretation methods, and a library of front-end visualization components. We demonstrate the toolkit’s flexibility and utility by implementing live demos for five interpretation methods (e.g., saliency maps and adversarial attacks) on a variety of models and tasks (e.g., masked language modeling using BERT and reading comprehension using BiDAF). These demos, alongside our code and tutorials, are available at https://allennlp.org/interpret.
Full Text Available
